Mental illness is one of the leading causes of disability worldwide. According to a report evaluating US health from 1990 to 2010, the disease burden of mental illness is among the highest of all diseases [1]. Disease burden refers to the impact of a health problem as measured by financial cost, mortality, morbidity, and other factors. Most patients with serious mental diseases or disorders struggle for years without being able to live a normal life. This is a heavy burden for patients, their families, and society.
New York City faces the same problem, with a high rate of mental illness. Several studies have indicated a possible relationship between urban life and a higher risk of mental illness. In 2015, New York City launched an action plan called “ThriveNYC” [2], which aims to change the way people think about mental health and to provide more accessible services citywide.
The goals of my project here are:
To gain insight into the mental health situation in NYC;
To discover potentially effective interventions and provide guidance on distributing NYC’s funding and services, with the ultimate aim of improving New Yorkers’ mental health.
The data I am using come from Health Data NY, Statewide Planning and Research Cooperative System (SPARCS) [3]. The raw data include the NY hospital inpatient discharges for all diseases from 2009 to 2014.
This file has 3 sections.
Section 1: Explore the 2014 dataset and discover interesting patterns.
Section 2: Explore the combined dataset from 2009-2014; analyze the economic burden caused by mental diseases in NYC.
Section 3: Three Final plots.
References:
[1] US Burden of Disease Collaborators. The state of US health, 1990-2010: burden of diseases, injuries, and risk factors. JAMA, 310(6): 591-608, 2013.
[2] https://thrivenyc.cityofnewyork.us/
[3] https://health.data.ny.gov/Health/Hospital-Inpatient-Discharges-SPARCS-De-Identified/mpue-vn67
——————————————
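# dplyr supplies %>%, group_by(), summarise() and arrange(); destring() is assumed to come from taRifx
library(dplyr)
library(taRifx)
df14 <- read.csv("./Data/Hospital_Inpatient_Discharges__SPARCS_De-Identified___2014.csv")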
Hospitalizations for mental diseases and disorders are extracted from the dataset by DRG code, which is assigned according to the diagnosis. Neurological diseases and drug and alcohol abuse are excluded from the dataset.
Details of DRG code used are listed here: https://github.com/super-penguin/SPARCS-health-data.
# Subset the Data
# Subset the inpatient hospitalizations for Mental Diseases & Disorders in NYC in 2014
# The subset is based on DRG code
Mental.Code<- c(740, 750, 751, 752, 753, 754, 755, 756, 758, 759, 760, 561, 766)
County <- c("Bronx","Kings", "Manhattan", "Queens", "Richmond")
df14.NYC <- subset(df14, Hospital.County %in% County)
df14.mental<- subset(df14.NYC, APR.DRG.Code %in% Mental.Code)
# Convert the cost and charge ($) columns to numeric for further exploration
df14.NYC$Total.Charges<- destring(df14.NYC$Total.Charges)
df14.NYC$Total.Costs<- destring(df14.NYC$Total.Costs)
df14.mental$Total.Charges<- destring(df14.mental$Total.Charges)
df14.mental$Total.Costs<- destring(df14.mental$Total.Costs)
# Group the dataset by DRG code and count the patients for each disease
df14.NYC.DRG.Group<- df14.NYC %>%
group_by(APR.DRG.Code, APR.DRG.Description) %>%
summarise(Total.Patients.Number = n()) %>%
arrange(Total.Patients.Number)
In this figure, Schizophrenia and Bipolar Disorders both belong to mental disorders. Two of the top 10 diseases are therefore mental disorders, and Schizophrenia is the third most common. This underlines the importance of understanding the mental health situation in NYC.
Schizophrenia, Bipolar Disorders and Major Depressive Disorders are the top 3 most common mental illnesses in NYC in 2014.
Comparing the emergency admission rates of the top 10 diseases in NYC, Schizophrenia and Bipolar Disorders are not the highest, but both lie in the higher range (around 70%).
The emergency admission rate of mental diseases and disorders points to the importance of early action in improving mental health. Improving early counseling services and early response teams might be an effective way to give patients the help they need and prevent their condition from getting worse.
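The emergency admission rate per diagnosis discussed above can be sketched as follows. This is a minimal illustration in base R on invented toy rows, not the code behind the figure; the column name `Type.of.Admission` and the value "Emergency" are assumptions about how the admission type is stored, and `emergency_rate` is a hypothetical helper.

```r
# Toy stand-in for df14.NYC (invented rows, for illustration only).
# Type.of.Admission and the label "Emergency" are assumed names.
toy <- data.frame(
  APR.DRG.Description = c("Schizophrenia", "Schizophrenia", "Schizophrenia",
                          "Bipolar Disorders", "Bipolar Disorders",
                          "Vaginal Delivery", "Vaginal Delivery"),
  Type.of.Admission = c("Emergency", "Emergency", "Elective",
                        "Emergency", "Urgent",
                        "Elective", "Emergency"),
  stringsAsFactors = FALSE
)

# Share of discharges per diagnosis whose admission type is "Emergency"
emergency_rate <- function(df) {
  is_emergency <- df$Type.of.Admission == "Emergency"
  rate <- vapply(split(is_emergency, df$APR.DRG.Description), mean, numeric(1))
  sort(rate, decreasing = TRUE)
}

emergency_rates <- emergency_rate(toy)
print(round(100 * emergency_rates, 1))
```

On the real data the same helper could be applied to the rows of the top 10 diagnoses, provided the admission type column matches the assumed name.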
The average total charge for Schizophrenia is the third highest among the top 10 diseases. It is a huge financial burden both to patients’ families and to our city.
The distributions of total charges and costs for mental diseases differ little from those for all other diseases.
Compared with the distribution for all diseases in NYC, the hospitalization length for mental diseases is shifted toward longer stays. The peak of the distribution is similar to that of other diseases, but more mental disease cases fall in the longer range.
The hospitalization lengths of Schizophrenia and Bipolar Disorders are both in the higher range. This figure might still understate the long-term burden of mental diseases, since most patients need extra care at home or in specialized facilities after being discharged from the hospital.
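The length-of-stay comparison above can be summarized numerically as well as visually. The sketch below uses invented values and base R only. It assumes that `Length.of.Stay` may arrive as text and that capped stays are stored as a label such as "120 +"; `parse_los` is a hypothetical helper that strips everything except digits before converting.

```r
# Convert Length.of.Stay to numeric; non-digits are stripped so that a
# capped label such as "120 +" (assumed format) becomes 120
parse_los <- function(x) as.numeric(gsub("[^0-9]", "", as.character(x)))

# Invented stays in days, for illustration only
toy_all    <- c("1", "2", "2", "3", "3", "4", "5", "7")
toy_mental <- c("3", "7", "10", "14", "21", "120 +")

los_all    <- parse_los(toy_all)
los_mental <- parse_los(toy_mental)

# Median stay and share of stays longer than one week for each group
summary_los <- data.frame(
  group        = c("all diseases", "mental diseases"),
  median_days  = c(median(los_all), median(los_mental)),
  share_over_7 = c(mean(los_all > 7), mean(los_mental > 7))
)
print(summary_los)
```

A median and a tail share are used here because a few very long stays can dominate the mean.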
In this part, the severity of mental disorders is compared with the top 10 diseases in NYC in several respects. In 2014, schizophrenia alone was already the third leading cause of patient hospitalization in NYC. Besides the shocking number of patients with mental problems, the high charges and long hospitalization durations are a heavy burden both to patients and to our city. Most patients with severe mental disorders lose the ability to work and live by themselves for years or even a lifetime, and they constantly need extra care at extra cost.
These figures make it clear that mental health is one of the urgent problems facing our city, and that effective data sharing should be coordinated to develop new strategies. The next part focuses on exploring the 2014 hospitalization data for mental diseases in NYC.
df14.fc_by_age_race <- df14.mental %>%
filter(Race != "Multi-racial") %>%
group_by(Age.Group, Race, Gender) %>%
summarise(mean_days = mean(as.numeric(gsub("[^0-9]", "", Length.of.Stay))),  # strip non-digits so text such as "120 +" still parses
mean_costs = mean(as.numeric(Total.Costs)),
n = n()) %>%
arrange(Age.Group)
Most patients with mental diseases are between 18 and 69 years old, a trend that corresponds to the age distribution of the population. As for gender, males are more likely to suffer from mental diseases.
There seems to be a marked racial difference among patients with mental diseases: Black/African American patients seem more likely to suffer from mental diseases. However, no conclusion can be drawn without normalizing the patient numbers by the population of each racial group.
There is a significant racial difference in hospitalizations for mental diseases and disorders in NYC. The percentage of Black/African Americans with mental problems is almost twice that of other races. Statistical analysis of the racial difference will be performed on all the data from 2009 to 2014 in Section 2.
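Normalizing patient numbers by population, as described above, amounts to dividing each group's count by the size of that group. The sketch below shows the arithmetic only: the counts and populations are hypothetical placeholders, not NYC figures, and `rate_per_1000` is an illustrative helper.

```r
# Hypothetical placeholder numbers, NOT real NYC counts or census figures
patients   <- c("Black/African American" = 400, "White" = 300, "Other Race" = 300)
population <- c("Black/African American" = 20000, "White" = 30000, "Other Race" = 25000)

# Hospitalizations per 1,000 residents; groups are matched by name so that
# a different ordering of the two vectors cannot mix them up
rate_per_1000 <- function(patients, population) {
  stopifnot(all(names(patients) %in% names(population)))
  1000 * patients / population[names(patients)]
}

rate_by_race <- rate_per_1000(patients, population)
print(rate_by_race)
```

With real data, the population vector would come from an external source such as census estimates for the same years.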
df14.fc_by_age_disease <- df14.mental %>%
group_by(Age.Group, APR.DRG.Description) %>%
summarise(mean_days = mean(as.numeric(gsub("[^0-9]", "", Length.of.Stay))),  # strip non-digits so text such as "120 +" still parses
mean_costs = mean(as.numeric(Total.Costs)),
sum_costs = sum(as.numeric(Total.Costs)),
n = n()) %>%
arrange(Age.Group)
df14.fc_by_race_disease <- df14.mental %>%
group_by(Race, APR.DRG.Description) %>%
filter(Race != "Multi-racial") %>%
summarise(mean_days = mean(as.numeric(gsub("[^0-9]", "", Length.of.Stay))),  # strip non-digits so text such as "120 +" still parses
mean_costs = mean(as.numeric(Total.Costs)),
sum_costs = sum(as.numeric(Total.Costs)),
n = n()) %>%
arrange(Race)
All these explorations reveal the high economic burden of Schizophrenia among Black/African Americans on NYC in 2014. We are going to explore the economic burden by geographic region of NYC in Section 2.
The plot of hospital admissions by day of the week shows an interesting pattern. Far more patients are admitted on weekdays than on weekends. The number rises from Monday to Wednesday and falls from Wednesday to Friday. It then drops significantly on Saturday and goes lower still on Sunday. This interesting trend matches the rhythm of work pressure in daily life. It suggests that mental illness in NYC is likely to be triggered by work and study pressure.
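The weekday pattern described above comes down to counting admissions per day and comparing weekdays with weekends. A minimal base R sketch on invented rows follows; the column name `Admit.Day.of.Week` and the three-letter day labels are assumptions about how the admission day is stored.

```r
# Invented admissions, for illustration only.
# Admit.Day.of.Week and the day labels are assumed names.
toy <- data.frame(
  Admit.Day.of.Week = c("MON", "TUE", "TUE", "WED", "WED", "WED",
                        "THU", "THU", "FRI", "SAT", "SUN"),
  stringsAsFactors = FALSE
)

# Fix the order of the days so the counts read Monday through Sunday
day_levels <- c("MON", "TUE", "WED", "THU", "FRI", "SAT", "SUN")
tab <- table(factor(toy$Admit.Day.of.Week, levels = day_levels))
counts <- setNames(as.integer(tab), names(tab))

# Average admissions per day on weekdays versus weekends
is_weekend   <- names(counts) %in% c("SAT", "SUN")
mean_weekday <- mean(counts[!is_weekend])
mean_weekend <- mean(counts[is_weekend])

print(counts)
print(c(weekday = mean_weekday, weekend = mean_weekend))
```

Setting the factor levels explicitly also keeps days with zero admissions in the table instead of dropping them.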
Racial and gender differences are two important factors in the 2014 dataset. After this exploration, we are going to process and analyze all the datasets from 2009 to 2014. The next section focuses on three factors: race, gender and geographic region of NYC.
The datasets from 2009 to 2014 have been cleaned in the same way as “df14.mental” and combined into one data file: NYC2009_2014_inpatient_discharge_mental.csv
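One way the per-year cleaning and combining described above could look is sketched below. It reuses the `Mental.Code` and `County` vectors from Section 1, but the helper `clean_year`, the `Year` column and the toy yearly tables are hypothetical; the actual script that produced the combined file is not shown here.

```r
# Keep NYC mental-health discharges from one year's table and tag the year
clean_year <- function(df, year, codes, counties) {
  keep <- df$Hospital.County %in% counties & df$APR.DRG.Code %in% codes
  out <- df[keep, , drop = FALSE]
  out$Year <- rep(year, nrow(out))
  out
}

Mental.Code <- c(740, 750, 751, 752, 753, 754, 755, 756, 758, 759, 760, 561, 766)
County <- c("Bronx", "Kings", "Manhattan", "Queens", "Richmond")

# Toy yearly tables standing in for the SPARCS files read with read.csv()
yearly <- list(
  "2013" = data.frame(Hospital.County = c("Kings", "Albany"),
                      APR.DRG.Code = c(750, 750)),
  "2014" = data.frame(Hospital.County = c("Bronx", "Queens"),
                      APR.DRG.Code = c(753, 139))
)

# Clean every year with the same rules, then stack the results
cleaned <- Map(function(df, yr) clean_year(df, as.integer(yr), Mental.Code, County),
               yearly, names(yearly))
NYC.mental <- do.call(rbind, cleaned)
rownames(NYC.mental) <- NULL
print(NYC.mental)
```

Stacking with `rbind` requires every year to carry the same columns, so any column that changed name between years would have to be harmonized first.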
Save the data description to the file “NYC.mental.data_dictionary.txt”, with means and other information.
##                               Df    Sum Sq Mean Sq F value   Pr(>F)
## Race                           2   2933531 1466765   3.120   0.0445 *
## Hospital.County                4  13639464 3409866   7.254 9.08e-06 ***
## Gender                         2   1547778  773889   1.646   0.1932
## Race:Hospital.County           8   6916519  864565   1.839   0.0661 .
## Race:Gender                    2    220224  110112   0.234   0.7912
## Hospital.County:Gender         4    828332  207083   0.441   0.7793
## Race:Hospital.County:Gender    8    390637   48830   0.104   0.9991
## Residuals                   1138 534911505  470045
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Conclusion: Race and Hospital.County each have a statistically significant effect in our dataset. Next, we are going to plot them separately and perform post-hoc analysis.
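The post-hoc step can be sketched with a one-way ANOVA followed by Tukey's HSD test, using `aov()` and `TukeyHSD()` from base R. The data below are simulated for illustration and are not the SPARCS counts; only the formula `Patients.Number ~ Race` mirrors the fit reported in the output.

```r
# Simulated patient counts per race, for illustration only
set.seed(1)
toy <- data.frame(
  Race = factor(rep(c("Black/African American", "Other Race", "White"),
                    each = 20)),
  Patients.Number = c(rnorm(20, mean = 300, sd = 40),
                      rnorm(20, mean = 220, sd = 40),
                      rnorm(20, mean = 180, sd = 40))
)

# One-way ANOVA: does the mean patient number differ between races?
fit <- aov(Patients.Number ~ Race, data = toy)
print(summary(fit))

# Tukey's HSD compares every pair of races while controlling the
# family-wise error rate at the given confidence level
tukey <- TukeyHSD(fit, "Race", conf.level = 0.95)
print(tukey)
```

Each row of `tukey$Race` is one pairwise comparison, with the difference in means, its confidence bounds and the adjusted p-value.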
##               Df    Sum Sq Mean Sq F value Pr(>F)
## Race           2   2933531 1466765   3.062 0.0471 *
## Residuals   1166 558454460  478949
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = Patients.Number ~ Race, data = NYC.mental.new)
##
## $Race
##                                         diff       lwr       upr     p adj
## Other Race-Black/African American  -74.82015 -192.5655  42.92520 0.2954996
## White-Black/African American      -121.43272 -237.2946  -5.57088 0.0373768
## White-Other Race                   -46.61257 -162.2383  69.01317 0.6112210
##                 Df    Sum Sq  Mean Sq F value Pr(>F)
## Hospital.County  4 254240272 63560068   38.79 <2e-16 ***
## Residuals       85 139263381  1638393
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = Total.Patients ~ Hospital.County, data = NYC.fc_by_race_county)
##
## $Hospital.County
##                          diff        lwr        upr     p adj
## Kings-Bronx         2298.7222  1109.5199  3487.9246 0.0000061
## Manhattan-Bronx     3101.0000  1911.7977  4290.2023 0.0000000
## Queens-Bronx         647.5000  -541.7023  1836.7023 0.5540743
## Richmond-Bronx     -1640.0000 -2829.2023  -450.7977 0.0021225
## Manhattan-Kings      802.2778  -386.9246  1991.4801 0.3357701
## Queens-Kings       -1651.2222 -2840.4246  -462.0199 0.0019418
## Richmond-Kings     -3938.7222 -5127.9246 -2749.5199 0.0000000
## Queens-Manhattan   -2453.5000 -3642.7023 -1264.2977 0.0000013
## Richmond-Manhattan -4741.0000 -5930.2023 -3551.7977 0.0000000
## Richmond-Queens    -2287.5000 -3476.7023 -1098.2977 0.0000068
Conclusion: There are significant racial and county differences in the utilization of inpatient mental health services in NYC.
Load NYC map at county level.
## OGR data source with driver: ESRI Shapefile
## Source: "nybb_16c", layer: "nybb"
## with 5 features
## It has 4 fields
There is a significant racial difference for mental diseases and disorders hospitalization in NYC.
From 2009 to 2014, Black/African Americans had the highest number of mental disorder hospitalizations in almost every age group compared with other racial groups. The ANOVA in Section 2 also showed that Black/African Americans have a significantly higher rate of mental disease problems than Whites (p = 0.0373768). Final figure 1 shows the percentage of patients with mental diseases in each racial group. This figure removes the influence of the different racial population sizes in NYC and presents the shocking differences directly. In summary, the population rate of mental diseases among Black/African Americans in NYC is 0.6% higher than among Whites.
I looked into potential reasons. One possible explanation is genetic difference, but I did not find much evidence to support this assumption. Another possible reason is bias in the diagnosis of mental disorders, meaning that one race is more likely to be diagnosed with severe mental disorders. Some studies show that, for the same symptoms, a Black/African American is more likely to be diagnosed with schizophrenia where a White American is diagnosed with depression. However, this observation does not explain my results, since I grouped all of those mental diseases together. I will explore the data further to see whether I can come up with a reasonable explanation for the racial difference.
In Final plot 2, far more patients are admitted on weekdays than on weekends. The number rises from Monday to Wednesday and falls from Wednesday to Friday. It then drops significantly on Saturday and goes lower still on Sunday. This interesting trend matches the rhythm of work pressure in daily life. It suggests that mental illness in NYC is likely to be triggered by work and study pressure.
This result is not surprising. The dense population and high living pressure in Manhattan might be the leading cause of this difference. Based on this observation, more funding and services should be directed to Manhattan to improve the mental health of New Yorkers.
In addition, since the datasets do not have complete patient zip code information, I used the hospital county information to analyze the economic burden at the county level. I wonder whether this difference could be caused by the density and capacity of mental hospitals in Manhattan compared with other counties.
The gender difference is another interesting observation. I debated whether to include Maternal Depression in the total mental health data, since it might introduce gender bias in the final results. However, even with Maternal Depression included, male adults still have a much higher rate of hospitalization for mental problems. This also indicates that work and family pressure might be one of the leading causes of mental problems in NYC.
This dataset is limited in many ways. First, the patient hospitalization information in this dataset does not account for readmission. Patients with mental diseases and disorders have a high readmission rate, but the readmission rate is missing from this dataset. This missing factor might introduce bias, and interesting observations might be overlooked without considering readmission.
In addition, for confidentiality reasons, the zip code information for patients is incomplete (there are many missing values). Where a patient’s zip code is available, only the first 3 digits are shown, which makes it impossible to map the mental health profile at the community level. New York City is a large and ethnically diverse metropolis. Analysis at the county level does not provide enough information to reflect the health situation given the diverse demographic characteristics of each community. I will continue this project with more detailed data and hope to map a better NYC mental health profile.
Future work: analyze the location and capacity of mental hospitals in NYC at the county level, in the hope of getting a better idea of how funding for mental health problems is distributed in NYC.